(57) Abstract

A multi-layer network element (12) forwards received packets from an input
port to one or more output ports (38) with quality of service. When output
queues (54) meet or exceed a threshold value below the queue's capacity,
packets are randomly discarded. When the queue becomes full, the network
element determines which flow caused the queue to overflow. The priority of
that flow is lowered. In a multicast packet, the packet may have different
priorities at each output port. Scheduling of multiple output queues at each
output port uses a weighted round robin approach that allocates a weight
portion of packets to transmit at each time interval. A packet is not
interrupted during its transmission, even if the weight portion is met during
a packet's transmission. The excess number of bytes transmitted as a result of
not interrupting the packet is accounted for in the next round.
A SYSTEM AND METHOD FOR A
QUALITY OF SERVICE IN 
A MULTI-LAYER NETWORK ELEMENT 

FIELD OF THE INVENTION

The present invention relates in general to packet forwarding within a network 
and, in particular, to a system and method for forwarding packets using multi-layer
information. 

BACKGROUND OF THE INVENTION

Communication between computers has become an important aspect of
everyday life in both private and business environments. Networks provide a medium
for this communication and further for communication between various types of
elements connected to the network such as servers, personal computers, workstations,
memory storage systems, or any other component capable of receiving or transmitting
data to or from the network. The elements communicate with each other using defined
protocols that define the orderly transmission and receipt of information. In general,
the elements view the network as a cloud to which they are attached and for the most
part do not need to know the details of the network architecture such as how the
network operates or how it is implemented. Ideally, any network architecture should
support a wide range of applications and allow a wide range of underlying
technologies. The network architecture should also work well for very large networks,
be efficient for small networks, and adapt to changing network conditions.

Networks can generally be differentiated based on their size. At the lower
end, a local area network (LAN) describes a network having characteristics including
multiple systems attached to a shared medium, high total bandwidth, low delay, low
error rates, broadcast capability, limited geography, and a limited number of stations,
and is generally not subject to post, telegraph, and telephone regulation. At the upper
end, an enterprise network describes connections of wide area networks and LANs
connecting diverse business units within a geographically diverse business organization.

To facilitate communication within larger networks, the networks are typically
partitioned into subnetworks, each sharing some common characteristic such as
geographical location or functional purpose, for example. The partitioning serves two
main purposes: to break the whole network down into manageable parts and to
logically (or physically) group users of the network. Network addressing schemes may
take such partitioning into account and thus an address may contain information about
how the network is partitioned and where the address fits into the network hierarchy.

For descriptive and implementation purposes, a network may be described as
having multiple layers with end devices attached to it, communicating with each other
using peer-to-peer protocols. The well-known Open Systems Interconnection (OSI)
Reference Model provides a generalized way to view a network using seven layers and
is a convenient reference for mapping the functionality of other models and actual
implementations. The distinctions between the layers in any given model are clear, but
the implementation of any given model or mapping of layers between different models
is not. For example, the standard promulgated by the Institute of Electrical and
Electronics Engineers (IEEE) in its 802 protocols defines standards for LANs and its
definitions overlap the bottom two layers of the OSI model.

In any such model, a given layer communicates either with the same layer of a
peer end station across the network, or with the same layer of a network element
within the network itself. A layer implements a set of functions that are usually 
logically related and enable the operation of the layer above it. 

The relevant layers for describing this invention include OSI Layers 1 through
4. Layer 1, the physical layer, provides functions to send and receive unstructured bit
patterns over a physical link. The physical layer concerns itself with such issues as the
size and shape of connectors, conversion of bits to electrical signals, and bit-level 
synchronization. More than one type of physical layer may exist within a network. 
Two common types of Layer 1 are found within IEEE Standard 802.3 and FDDI (Fiber 
Distributed Data Interface). 



Layer 2, the data link layer, provides support for framing, error detecting,
accessing the transport media, and addressing between end stations interconnected at or
below layer 2. The data link layer is typically designed to carry packets of information
across a single hop, i.e., from one end station to another within the same subnet, or
LAN.

Layer 3, the network layer, provides support for such functions as end to end
addressing, network topological information, routing, and packet fragmentation. This
layer may be configured to send packets along the best "route" from its source to its
final destination. An additional feature of this layer is the capability to relay
information about network congestion to the source or destination if conditions
warrant.

Layer 4, the transport layer, provides application programs such as an electronic
mail program with a "port address" which the application can use to interface with the
data link layer. A key difference between the transport layer and the lower layers is
that an application on a source end station can carry out a conversation with a similar
application on a destination end station anywhere in the network; whereas the lower
layers carry on conversations with end stations which are its immediate neighbors in
the network. Layer 4 protocols also support reliable connection oriented services; an
example Layer 4 protocol providing such services is the Transmission Control Protocol
(TCP).

Different building blocks exist for implementing networks that operate at these 
layers. End stations are the end points of a network and can function as sources, 
destinations and network elements or any other intermediate point for forwarding data 
received from a source to a destination. 

At the simplest level are repeaters, which are physical layer relays which simply
forward bits at Layer 1.

Bridges represent the next level above repeaters and are data link layer entities
which forward packets within a single LAN using look-up tables. They do not modify
packets, but just forward packets based on a destination. Most bridges are learning
bridges. In these bridges, if the bridge has previously learned a source, it already
knows to which port to forward a packet destined for that source. If the bridge has not
yet forwarded a packet from the destination, the bridge does not know the port location
of the destination, and forwards the packet to all unblocked output ports, excluding the
port of arrival. Other than acquiring a knowledge of which ports sources are transmitting
packets to, the bridge has no knowledge of the network topology. Many LANs can be
implemented using bridges only.
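
By way of illustration only, the learning behavior described above may be sketched as follows. The class name `LearningBridge`, the use of string addresses, and the omission of blocked ports are assumptions made for this sketch and are not taken from any particular bridge implementation.

```python
class LearningBridge:
    """Minimal learning-bridge sketch: learn source ports, flood unknown destinations."""

    def __init__(self, ports):
        self.ports = set(ports)   # all (unblocked) ports on the bridge
        self.table = {}           # look-up table: source address -> port

    def receive(self, src, dst, in_port):
        """Return the list of ports on which the packet is forwarded."""
        # Learn (or relearn) the port on which this source was seen.
        self.table[src] = in_port
        out_port = self.table.get(dst)
        if out_port is None:
            # Unknown destination: flood to every port except the port of arrival.
            return sorted(self.ports - {in_port})
        if out_port == in_port:
            # Destination is on the arrival segment; nothing to forward.
            return []
        return [out_port]
```

In this sketch a packet from A to an unlearned B arriving on port 1 is flooded to the remaining ports; once B has transmitted, later packets to B are sent to B's port only.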

Routers are network layer entities which can forward packets between LANs. 
They have the potential to use the best path that exists between sources and 
destinations based on information exchanged with other routers that allows the routers
to have knowledge of the topology of the network. Factors contributing to the "best"
path might include cost, speed, traffic, and bandwidth, as well as others. 

Brouters are routers which can also perform as bridges. For those layer 3
protocols of which the brouter knows, it uses its software to determine how to forward
the packet. For all other packets, the brouter acts as a bridge.

Switches are generalized network elements for forwarding packets wherein the
composition of the switch and whether it implements layer 2 or layer 3 is not relevant.

Typically, bridges forward packets in a flat network without any cooperation by 
the end stations, because the LAN contains no topological hierarchy. If a LAN, for 
example, is designed to support layer 3 functionality, then routers are used to
interconnect and forward packets within the LAN.

Bridges cannot use hierarchical routing addresses because they base their
forwarding decisions on media access control (MAC) addresses which contain no
topological significance. Typically MAC addresses are assigned to a device at its time
of manufacture. The number of stations that can be interconnected through bridges is
limited because traffic isolation, bandwidth, fault detecting, and management aspects
become too difficult or burdensome as the number of end stations increases.

Learning bridges self-configure, allowing them to be "plug and play" entities
requiring virtually no human interaction for setup. Routers, however, require intensive
configuration, and may even require configuration activities at the end nodes. For
example, when a network utilizes the Transmission Control Protocol/Internet Protocol
(TCP/IP), each end node must manually receive its address and subnet mask from an
operator, and such information must be input to the router.

Generally, as the size and complexity of a network increases, the network
requires more functionality at the higher layers. For example, a relatively small LAN
can be implemented by using Layer 1 elements such as repeaters or bridges, while a
very large network uses up to and including Layer 3 elements such as routers.

A single LAN is typically insufficient to meet the requirements of an 
organization because of the inherent limitations: (1) on the number of end stations that 
can be attached to a physical layer segment; (2) the physical layer segment size; and 
(3) the amount of traffic, which is limited because the bandwidth of the segment must
be shared among all the connected end stations. In order to overcome these constraints, 
other network building blocks are required. 

As briefly described above, when the number of end stations in a network
increases, the network may be partitioned into subnetworks. A typical address in a
partitioned network includes two parts: a first part indicating the subnetwork; and a
second part indicating an address within the subnetwork. These types of addresses
convey topological information because the first part of the address defines
geographical or logical portions of the network and the second part defines an end
station within the subnetwork portion. Routing with hierarchical addressing involves two
steps: first, packets are routed to the destination's subnetwork; and second, packets are
forwarded to the destination within the subnetwork.
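
The two-part address and the two routing steps may be illustrated with a short sketch. IPv4 with a prefix length is assumed here purely as a familiar example of a hierarchical scheme; the function names `split_address` and `routing_step` are hypothetical.

```python
import ipaddress

def split_address(addr, prefix_len):
    """Split an address into (subnetwork part, address within the subnetwork)."""
    iface = ipaddress.ip_interface(f"{addr}/{prefix_len}")
    subnet = iface.network
    host = int(iface.ip) & int(subnet.hostmask)
    return subnet, host

def routing_step(local_subnet, dest_addr, prefix_len):
    """Decide which of the two routing steps applies for a destination."""
    dest_subnet, host = split_address(dest_addr, prefix_len)
    if dest_subnet != ipaddress.ip_network(local_subnet):
        # Step 1: route toward the destination's subnetwork.
        return ("route-to-subnet", str(dest_subnet))
    # Step 2: forward to the end station within the subnetwork.
    return ("deliver-in-subnet", host)
```

For example, under these assumptions `routing_step("192.168.5.0/24", "192.168.9.20", 24)` selects the first step, while a destination of `192.168.5.77` selects the second.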

An end station receives a unique data link address, the MAC address, at the
time of manufacture, allowing the end station to attach to any LAN within a bridged
network without worrying about duplicate addresses. Data link addresses therefore
cannot convey any topological information. Bridges, unlike routers, forward packets
based on data link addresses and thus cannot interpret hierarchical addresses.

The current internet is being forced to deal with increasing numbers of users
and increasing demands of multimedia applications. Future networks will be required
to support even higher bandwidth, larger numbers of users, and traffic classification
requirements by the network. Statistical studies show that the network domain as well
as the number of workstations connected to the network will grow at a faster rate in
the future. The trend is also to support multiple traffic types with varied characteristics
on the same physical link. This calls for more network bandwidth and efficient usage of
resources. To meet the bandwidth requirement, the speed on the networks is on the
upward trend, reaching gigabit speeds.

Network designers frequently use one particular combination of ISO Layer 2 
and Layer 3 because of the success of the Internet and the increasing number of 
products and networks using the Internet. Specifically, in a typical Internet-associated
network, designers combine an implementation in accordance with the IEEE 802
Standard (which overlaps ISO Layer 1 and Layer 2) with the Internet Protocol (IP)
network layer. This combination is also becoming popular within enterprise networks 
such as intranets. 

Supporting this combination by building networks out of layer 2 network
elements provides fast packet forwarding but has little flexibility in terms of traffic
isolation, redundant topologies, and end-to-end policies for queuing and administration
(access control). Building such networks out of layer 3 elements alone sacrifices
performance and is impractical from the hierarchical point of view because of the
overhead associated with having to parse the layer 3 header and modify the packet if
necessary. Furthermore, using solely layer 3 elements forces an addressing model with
one end station per subnet, and no layer 2 connectivity between the end stations.

Networks built out of a combination of layer 2 and layer 3 devices are used
today, but suffer from performance and flexibility shortcomings. Specifically, with
increasing variation in traffic distribution (the role of the "server" has multiplied with
browser-based applications), the need to traverse routers at high speed is crucial.

The choice between bridges and routers typically results in significant tradeoffs
(in functionality when using bridges, and in speed when using routers). Furthermore,
the service characteristics, such as priority, within a network are generally no longer
homogeneous, regardless of whether traffic patterns involve routers. In these networks,
differing traffic types exist and require different service characteristics such as
bandwidth, delay, etc.
To meet the traffic requirements of applications, the bridging devices should
operate at line speeds, i.e., they operate at or faster than the speed at which packets
arrive at the device, but they also must be able to forward packets across
domains/subnetworks. Even though current hybrid bridge/router designs are able to
achieve correct network delivery functions, they are not able to meet today's
increasing speed requirements.

What is needed is a switch or network element that forwards both layer 2 and
layer 3 packets quickly and efficiently both within a subnetwork and to other networks.
Further, a network element is needed that can forward layer 3 packets at wire-speed,
i.e., as fast as packets enter the network element. Additionally, a network element is
needed that allows layer 2 forwarding within a subnetwork to have the additional
features available in layer 3 routing and to provide certain quality of service for
applications within the subnetwork, such as priority and bandwidth reservation.

SUMMARY OF THE INVENTION

The present invention enables the above problems to be substantially overcome
by providing a system and method for a multi-layer network element for forwarding
received packets to one or more appropriate output ports.

The apparatus according to one embodiment that detects and handles congestion
in an output port of a multi-layer network element comprises a central processor unit
(CPU) and a switching element. The switching element is configured to output packets
to a network through output ports. The switching element includes at least one
variable-length output queue that queues packets for output, having storage locations
for packet pointers. Each queue has associated with it a start register that stores a
pointer to the storage location at the front of the queue and an end register that stores a
pointer to the storage location at the end of the queue as determined by the number of
storage locations. The queues also have associated with each of them a next-free register
that stores a pointer to the next available storage location, wherein packet pointers are
stored in the output queue beginning at the location pointed to by the start register and
the next-free register is incremented as the next available storage location moves
toward the second pointer. The output queues also have associated with each a
programmable threshold register that stores a threshold pointer to a storage location
between the location represented by the start register and the location represented by
the end register.

Threshold logic outputs a congestion signal when the value in the next-free
register represents a storage location logically located between the location pointed to
by the threshold register and including the storage location pointed to by the end
register.

In response to the congestion signal, random discarding logic randomly selects
packets to discard, so that once the threshold is exceeded, incoming packets are
randomly discarded, using a packet discarding algorithm, such as Random Early
Discard (RED). Capacity logic outputs a queue full signal to the CPU when the queue
becomes full.
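
The register arrangement and the threshold, discard, and capacity logic described above may be modeled in software as a rough sketch. The class `OutputQueue`, the fixed `drop_probability`, and the string results are assumptions made for illustration; a full RED implementation would typically derive the discard probability from an averaged queue length rather than use a constant, and dequeuing is omitted.

```python
import random

class OutputQueue:
    """Sketch of an output queue of packet pointers with start, end,
    next-free, and threshold registers (all values are slot indices)."""

    def __init__(self, start, end, threshold, drop_probability=0.5, rng=None):
        assert start <= threshold <= end
        self.start = start            # start register: front of the queue
        self.end = end                # end register: last storage location
        self.threshold = threshold    # programmable threshold register
        self.next_free = start        # next-free register
        self.slots = {}               # storage locations holding packet pointers
        self.drop_probability = drop_probability  # assumed constant for the sketch
        self.rng = rng or random.Random()

    def congested(self):
        # Congestion signal: next-free lies between threshold and end, inclusive.
        return self.threshold <= self.next_free <= self.end

    def full(self):
        # Queue full signal: no storage location remains.
        return self.next_free > self.end

    def enqueue(self, packet_pointer):
        if self.full():
            return "full"
        if self.congested() and self.rng.random() < self.drop_probability:
            return "dropped"          # randomly discarded above the threshold
        self.slots[self.next_free] = packet_pointer
        self.next_free += 1
        return "queued"
```

Below the threshold every packet pointer is queued; from the threshold onward each arrival is discarded with the configured probability, and once the next-free register passes the end register the queue reports that it is full.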

The switching element also includes a memory having at least one entry that
stores information about forwarding decisions for the packet, wherein the entry is
adapted to indicate whether packets associated with that entry should be counted.
Memory access logic accesses the entry when an incoming packet associated with that
entry arrives at the switching element. A packet counter counts the number of times the
entry is accessed, to represent an entry bandwidth, and a computer program mechanism
coupled to the CPU compares the contents of the packet counter to a reservation-based
protocol negotiated value and lowers a priority of any future packet associated with the
entry and destined for the output queue.
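
As a hedged illustration of the counting and priority-lowering mechanism, the following sketch models one memory entry and the comparison performed by software on the CPU. The names `ForwardingEntry` and `police`, the convention that a larger number means a higher priority, and the resetting of the counter after each comparison are all assumptions of the sketch.

```python
class ForwardingEntry:
    """One forwarding-memory entry with an optional packet counter."""

    def __init__(self, priority, count_enabled=True):
        self.priority = priority            # assumed: larger value = higher priority
        self.count_enabled = count_enabled  # entry indicates whether to count packets
        self.packet_count = 0

    def access(self):
        """Called when a packet associated with this entry arrives."""
        if self.count_enabled:
            self.packet_count += 1
        return self.priority

def police(entry, negotiated_limit, lowest_priority=0):
    """Compare the counter to the negotiated value and lower priority if exceeded."""
    if entry.packet_count > negotiated_limit and entry.priority > lowest_priority:
        entry.priority -= 1
    entry.packet_count = 0   # assumed: the counter restarts for the next interval
```

A flow that stays within its negotiated value keeps its priority; a flow that exceeds it has the priority of its future packets lowered by one step per interval in this sketch.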

According to another embodiment of the invention an apparatus for handling
multiple priorities for a multicast packet being output from a network element on at
least two output ports includes at least two output queues having different priorities
and a memory configured to output forwarding information about the multicast packet
in response to a memory access based in part on a multicast address of the multicast
packet, the forwarding information including priority information indicating to which
output queue at each output port the multicast packet will be directed.
A central processing unit coupled to the memory utilizes a computer program
mechanism coupled to the central processing unit to modify the priority information
based on an amount of packets being transmitted through one of the output ports.

A central processing unit coupled to the memory may also utilize a computer
program mechanism coupled to the central processing unit configured to modify the
priority information based on information communicated between the network element
and an intended recipient of the multicast packet.
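
The per-port priority information for a multicast packet may be pictured as a small table, sketched below. The class `MulticastForwarding`, the group address string, and the single-step priority reduction are hypothetical and are shown only to make the data relationship concrete.

```python
class MulticastForwarding:
    """Sketch: forwarding information keyed by multicast address, holding
    the priority (output queue) selected independently at each output port."""

    def __init__(self):
        self.entries = {}   # multicast address -> {output port: priority}

    def add(self, group, port_priorities):
        self.entries[group] = dict(port_priorities)

    def lookup(self, group):
        """Return the per-port priorities for a multicast address."""
        return self.entries.get(group, {})

    def lower_priority(self, group, port, lowest=0):
        """CPU-side adjustment for one port, e.g. when that port carries too much traffic."""
        current = self.entries[group][port]
        self.entries[group][port] = max(lowest, current - 1)
```

Lowering the priority at one output port leaves the priority of the same multicast packet at the other ports unchanged.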

According to still another embodiment of the invention an apparatus for queue
scheduling in a network element includes at least one output port configured to output
packets, each packet having a byte length. At least two queues associated with each
output port, configured to queue packets to be output at each output port, are also
provided.

A weight register is associated with each queue and contains a weight number
generated based on weighting criteria. Transmitting logic at each output port transmits
packets identified in each queue according to a queue select signal and responsive to a
done signal. Scheduling logic at each output port selects one of the queues and
generates the queue select signal to the transmitting logic to indicate which queue will
be transmitting. Counter logic at each output port decrements the weight register by
the number of bytes transmitted by the transmitting logic, and zero logic is configured to
transmit the done signal when the number in the counter represents zero. Reloading
logic determines the number of packets transmitted after the done signal and places in
the weight register a value equal to the weight number minus the number of packets
transmitted after the done signal.
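
The scheduling behavior may be sketched as follows, under stated assumptions. The sketch accounts for the excess in bytes, as the abstract describes, and represents each queued packet simply by its byte length; the function name `weighted_round_robin` and the fixed number of rounds are illustrative only.

```python
from collections import deque

def weighted_round_robin(queues, weights, rounds):
    """Serve each queue up to its weight in bytes per round without interrupting
    a packet; any overshoot is deducted from that queue's next reload.

    queues  -- list of deques of packet byte lengths
    weights -- list of weight numbers (bytes per round), one per queue
    Returns the transmission order as (queue index, packet length) tuples.
    """
    credit = list(weights)          # models the weight register / counter
    sent = []
    for _ in range(rounds):
        for i, q in enumerate(queues):
            # Transmit whole packets while the counter has not reached zero.
            while q and credit[i] > 0:
                length = q.popleft()
                credit[i] -= length       # may go below zero mid-packet
                sent.append((i, length))
            # Reload: weight number minus the excess sent after "done".
            overshoot = -credit[i] if credit[i] < 0 else 0
            credit[i] = weights[i] - overshoot
    return sent
```

With weights of 200 and 100 bytes, a first queue that sends 150 and then 100 bytes overshoots by 50 bytes in the first round and is therefore reloaded with only 150 bytes for the second round.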

Still other embodiments of the present invention will become readily apparent
to those skilled in the art from the following detailed description, wherein is shown and
described only the embodiments of the invention by way of illustration of the best
modes contemplated for carrying out the invention. As will be realized, the invention
is capable of other and different embodiments and several of its details are capable of
modification in various obvious respects, all without departing from the spirit and scope
of the present invention. Accordingly, the drawings and detailed description are to be
regarded as illustrative in nature and not as restrictive.

BRIEF DESCRIPTION OF THE DRAWINGS 
Fig. 1 illustrates a system incorporating a multi-layer network element
according to the invention.

Fig. 2 illustrates the multi-layer networking element of Fig. 1.

Fig. 3 illustrates the switching element of the multi-layer network element in
more detail.

Fig. 4 illustrates the forwarding logic of the switching element in more detail.

Fig. 5 illustrates the class logic of Fig. 4 in more detail.

Fig. 6 illustrates the process used in determining which information dictates a
packet's path through the multi-layer network element.

Fig. 7 illustrates the information dependency in determining how to forward a
packet out of the network element.

Fig. 8 illustrates an output port in more detail.

DETAILED DESCRIPTION 
Fig. 1 illustrates a system incorporating a multi-layer network element
according to the present invention. The system includes the multi-layer network
element, various networks, end stations, routers, and bridges. By way of example and
as broadly embodied and described herein, a system 10 incorporating a multi-layer
network element 12 according to the present invention includes networks 14 and 16,
end stations 18, router 24, bridge 26, and local area networks (LAN) 28.

The bridge 26 connects some of the LANs 28 and end stations 18 to the
network 14 and to each other. The bridge 26 may be a conventional learning bridge.
The bridge 26 keeps track of the addresses of the end stations 18 that transmit a packet
showing up on one of ports 30 to the bridge 26. The end stations 18 may be any
device capable of sending or receiving packets of information. Typically, the end
stations 18 are personal computers, workstations, printers, servers, and/or any other
device that can be connected to a network.

The bridge 26 initially does not know on which of its ports packet destinations
are located, and must flood an incoming packet to all ports in order to properly
forward the packet. Once the bridge 26 receives a packet destined for an address it
already recognizes, the bridge 26 knows what port the destination is on so that it does
not have to flood the packet on all outgoing ports. Eventually, the bridge 26 has
learned enough addresses to all but eliminate the amount of flooding needed on the
ports. Of course, any time an end station 18 changes ports on the bridge 26, the bridge
26 must relearn the end station 18's port.

The bridge 26 typically does not modify the packet, contains no information
about the topology of the network 14, and examines few parts of the packet header.
The bridge 26 operates quickly because it makes no modifications to the packet and is
only concerned with learning sources and forwarding to destinations. Typically, bridges
26 use look-up tables to search for sources and destinations.

The router 24 connects the network 14 to the networks 16. Only one router 24
is illustrated by way of example, but there may be many routers connecting other
networks or end stations 18. The router 24 provides the communication necessary
between the network 14 and the networks 16 and may be a conventional router. Such
routers include layer 3 functionality for forwarding packets to an appropriate
destination including route calculation, packet fragmentation, and congestion control.
Routers of this type are described, for example, in Interconnections: Bridges and
Routers by Radia Perlman published by Addison-Wesley. The router 24 must have
knowledge of the topology of the network in order to determine the best route for
packets. The router 24's knowledge of the network is gained through topological
information passed between multiple such routers 24 connected to the network 14.

Software running on the router 24 parses an incoming packet to determine
various characteristics about the packet, including the type of the protocol being used
and the source and destination(s). Other determinations based on examining the packet
may be necessary, such as quality of service (QoS) factors including priority
and bandwidth reservation. The router 24 then uses the extracted information and
computes the next destination for the packet based on topology and route information
that is stored in the memory of the router 24. The router 24 also applies QoS rules and
actions.

The router 24's process for calculating the next destination may require many
accesses to memory and computation of the route from that information. Furthermore,
the packet is typically received and stored while any processing is taking place. After
the router 24 has determined what actions are necessary on the packet, any
modifications are made to the packet as stored in the memory or on the way out of the
router 24. The routers 24 are typically required to replace the layer 2 source and
destination of the packet for unicast packets, update any checksums of the packet, and
handle any issues related to packet lifetime.
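
Two of the per-hop modifications mentioned above, packet lifetime handling and checksum update, may be illustrated for the specific case of an IPv4 header. IPv4 is assumed here only as a concrete example; the function names `ipv4_checksum` and `forward_hop` are hypothetical, and replacement of the layer 2 addresses is not shown.

```python
import struct
from typing import Optional

def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words, as used by the IPv4 header checksum."""
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def forward_hop(header: bytes) -> Optional[bytes]:
    """Decrement the time-to-live and recompute the checksum.
    Returns None when the packet's lifetime has expired."""
    ttl = header[8]                         # TTL is byte 8 of the IPv4 header
    if ttl <= 1:
        return None
    updated = bytearray(header)
    updated[8] = ttl - 1
    updated[10:12] = b"\x00\x00"            # checksum field is zero while summing
    struct.pack_into("!H", updated, 10, ipv4_checksum(bytes(updated)))
    return bytes(updated)
```

A header produced by `forward_hop` sums to zero when passed through `ipv4_checksum` with its checksum field in place, which is the usual validity check at the next hop.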

To carry out the functions that the conventional router 24 performs, the
software examines memory locations, makes modifications to the packet, and calculates
new values for some fields. Such actions provide increased functionality beyond simple
packet forwarding like that found in bridges 26, such as determining the best route for
the packet and providing QoS features; however, in conventional routers 24 such actions
take up valuable time.

The network 14 provides communication paths for all of the elements
connected to it. In the example of Fig. 1, the elements include the multi-layer network
element 12, router 24, and bridge 26. Any number of elements could be connected to
the network 14 in a multitude of ways. Fig. 1 illustrates only one possible
combination. The elements connected to the network 14 do not require the network 14
to be of any particular size or configuration. For the end stations 18 and the bridge 26,
a detailed topological knowledge of the network 14 is not required.

The multi-layer network element 12 according to the present invention connects
various elements to the network 14 and to each other. As illustrated by way of
example, the multi-layer network element 12 connects a LAN 28, the end stations 18,
and the network 14. The multi-layer network element 12 combines the functions of
both a bridge and a router. Functioning as a router, the multi-layer network element 12
contains topological information about network 14 to intelligently route a packet to its
destination while providing associated layer 3 functionality typically found in a router
24. Functioning as a bridge, the multi-layer network element 12 learns source/port
combinations to forward layer 2 packets. The multi-layer network element 12 differs
from conventional bridge/router combinations in that certain layer 3 processing
operates as quickly as layer 2 switching found in the bridge 26.

Fig. 2 illustrates the multi-layer network element 12 of Fig. 1 in more detail.
The multi-layer network element 12 according to one embodiment of the invention
includes a processor 32, a processor memory 34, a switching element 36, a plurality of
network element ports 38, a forwarding memory 40, an associated memory 42, and
packet buffer memory 44. The end stations 18, the LAN 28, and the network 14 are
connected to the multi-layer network element 12 using a plurality of network element
ports 38. Other multi-layer network elements 12 may also be connected to the
multi-layer network element 12.

The switching element 36 is also connected to the processor 32, the forwarding
memory 40, the associated memory 42, and the packet buffer memory 44. The
processor 32 is also connected to the memory 34. Forwarding memory 40 and
associated memory 42 are connected to each other as well as to switching element 36.

The switching element 36 performs most of the packet forwarding functions
using both layer 2 and layer 3 information, and possibly also some layer 4 information,
stored in forwarding memory 40 and associated memory 42, without having to rely on
the processor 32 to calculate routes or determine appropriate actions on every packet.

The processor 32 performs tasks that the switching element 36 is not equipped
to handle. For example, when new layer 3 routes must be calculated, the processor 32
uses processor memory 34, which contains detailed information about the topology of
any networks reachable from the multi-layer network element 12. The processor 32
makes its computations primarily using software programming units in conjunction
with accesses to the memory 34. The switching element 36 makes its decisions
primarily in hardware, using the forwarding memory 40 and the associated memory 42.
The forwarding memory 40 and the associated memory 42 contain only a portion of
the information contained in the memory 34, and are configured for quick access and
retrieval.

Fig. 3 illustrates a detailed view of the switching element 36 and its
connections to the processor 32, the plurality of network element ports 38a-n, the
forwarding memory 40, the associated memory 42, and the packet buffer memory 44.
The switching element 36 includes input ports 50a-n, a forwarding logic 52, a packet
memory manager 54, and output ports 56a-n. Each input port 50i and output port 56i
corresponds to a network element port 38i. Each of the input ports 50 also connects to
both the forwarding logic 52 and the packet memory manager 54.

For a given i, an input port 50i receives packets from its respective multi-layer
network element port 38i and tests the packets for correctness. If the packet is ill
formed, it is discarded. Packets passing this initial screening are temporarily buffered
by the input port 50i. Once the input port 50i has buffered at least the first 64 bytes of
the received packet, the input port 50i passes the header to the forwarding logic 52.

The forwarding logic 52 is connected to the processor 32, the forwarding
memory 40, and the associated memory 42. The forwarding logic 52 performs several
functions. It initially screens the packet to determine whether the packet is
encapsulated, for example by Subnetwork Access Protocol (SNAP), or whether the
packet is tagged, for example by a virtual LAN (VLAN) identifier. If the packet is
either of those two types, the forwarding logic 52 uses offset information to locate
appropriate layer header information needed for further processing.

The forwarding logic 52 also searches the forwarding memory 40 for matches
at layer 2 and/or layer 3. The search may also include some information at layer 4. In
the preferred embodiment, the forwarding memory 40 is a content-addressable memory
(CAM) storing information about both layer 2 and layer 3 switching, and may contain
some layer 4 information. If a match is found, data stored in associated memory 42
and pointed to by the matching entry in the forwarding memory 40 serves to define the
actions that the switching element 36 must take to forward the packet to the appropriate
destination(s).

In another embodiment, the forwarding memory 40 could be implemented using
a sequentially addressed random access memory. In this embodiment, a hashing function
would be performed on the particular key. The resulting hashed value would be an
address into the memory 42 associated with the pre-hashed key.

In still another embodiment, the forwarding memory 40 and the associated
memory 42 could be contained in a single random access memory. In one
implementation of that single random access memory, the entries could be accessed
sequentially, requiring a hash front end. Another implementation of that single random
access memory could be a CAM.
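
The hashed embodiment can be illustrated with a minimal Python sketch. It is not
taken from the specification: the table size, the use of CRC-32 as the hashing function,
and the per-slot list used to resolve collisions are all assumptions made for
illustration, as are the names slot_address and HashedForwardingMemory.

```python
import zlib

TABLE_SIZE = 1024  # hypothetical number of slots in the random access memory


def slot_address(key: bytes) -> int:
    """Hash the search key down to a slot address (CRC-32 is an assumed choice)."""
    return zlib.crc32(key) % TABLE_SIZE


class HashedForwardingMemory:
    """Stand-in for a RAM-based forwarding memory with a hash front end."""

    def __init__(self) -> None:
        # Each slot keeps (key, result) pairs so colliding keys can coexist.
        self.slots = [[] for _ in range(TABLE_SIZE)]

    def insert(self, key: bytes, result: dict) -> None:
        bucket = self.slots[slot_address(key)]
        for i, (stored_key, _) in enumerate(bucket):
            if stored_key == key:
                bucket[i] = (key, result)  # overwrite an existing entry
                return
        bucket.append((key, result))

    def lookup(self, key: bytes):
        """Return the associated data for key, or None when nothing matches."""
        for stored_key, result in self.slots[slot_address(key)]:
            if stored_key == key:
                return result
        return None
```

Here the stored result plays the role of the data held in the associated memory 42;
a key might be, for example, a MAC address concatenated with an input port number.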
The packet memory manager 54 is connected to the packet buffer memory 44,
the input port 50i, and the output port 56i. As indicated above, each output port 56i
corresponds to one of the plurality of multi-layer network element ports 38i. While
illustrated as separate units, the input port 50i and output port 56i corresponding to a
particular multi-layer network element port 38i are tightly coupled since information
flows both ways through the network element ports 38.

After the forwarding logic 52 has determined what to do with the packet, it
passes that information to the input port 50i. If the input port 50i does not filter the
packet, then it requests pointers to free memory locations in the packet buffer memory
44 from the packet memory manager 54. The packet memory manager 54 responds by
providing location addresses of free memory space in the packet buffer memory 44.
The input port 50i then requests a write access from the packet memory manager 54
and sends the pointer and the data to the packet memory manager 54.

In some instances, the input port 50i must make modifications to the packet as
instructed by the forwarding logic 52. The input port 50i makes these
modifications prior to the packet being stored in the packet buffer memory 44. When
requested by the input port 50i, the packet memory manager 54 places the packet into
the appropriate address location specified by the input port 50i. The input port 50i then
passes information about where the packet is stored to the appropriate output ports 56
as determined from the information received at the input port 50i from the forwarding
logic 52.




wo 99/00949 



PCT/US98/13364 



In a preferred embodiment, the appropriate output ports may include no output
ports or one or more output ports. The output port 56i requests and receives packets
from the packet memory manager 54, and transmits the packet to its associated network
element port 38i when the conditions for transmission are met. In some instances, the
output port 56i must place its MAC address as the source address on outgoing packets. If
this situation is dictated by the results from the forwarding logic 52 as passed to the input
port 50i, the input port 50i places such an indication in the packet buffer memory 44.
The output port 56i detects this indication and replaces the address as the packet leaves
the output port 56i. Thus, only minor modifications to the packets are necessary on the
output side of the switching element 36.

According to the above embodiment, when the forwarding memory 40 contains
matching entries for layer 2 switching or layer 3 routing, the multi-layer network
element 12 will operate at wire-speed. Wire-speed is defined as the
maximum packet rate at which a given layer 1 and layer 2 combination can transport
packets. If an element connected to a network can process packets as fast as they enter
the element or faster, then the element operates at wire-speed.

In a preferred embodiment, the network element 12 processes packets for a
worst-case scenario of a steady stream of 64-byte packets entering all input ports 50
simultaneously. If the layer 3 information is not contained in the forwarding memory
40, the packet is forwarded using layer 2 information and then processed according to
conventional layer 3 processing by software in the processor 32.

Unlike conventional layer 3 processing, the processor 32 may update the
forwarding memory 40 by placing new layer 3 entries as they are learned and created.
Any packets matching the new entries are forwarded at wire-speed, i.e., forwarding
decisions are made for a packet before the next packet arrives.
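
To make the wire-speed requirement concrete, the following Python sketch estimates
the worst-case packet rate and the time available for each forwarding decision. It
assumes Ethernet framing, in which each frame is accompanied on the wire by 8 bytes of
preamble and start-of-frame delimiter and a 12-byte interframe gap; the specification
does not give these figures, and the function names are illustrative only.

```python
PREAMBLE_BYTES = 8          # preamble plus start-of-frame delimiter (Ethernet)
INTERFRAME_GAP_BYTES = 12   # minimum idle time between frames, in byte times


def max_packets_per_second(link_bps: float, frame_bytes: int = 64) -> float:
    """Maximum packet rate on one port for back-to-back frames of a given size."""
    bits_on_wire = (frame_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / bits_on_wire


def per_packet_budget_ns(link_bps: float, frame_bytes: int = 64) -> float:
    """Time between successive packet arrivals, i.e. the decision budget."""
    return 1e9 / max_packets_per_second(link_bps, frame_bytes)


if __name__ == "__main__":
    for name, rate in (("100 Mb/s", 100e6), ("1 Gb/s", 1e9)):
        print(name, round(max_packets_per_second(rate)), "packets/s,",
              round(per_packet_budget_ns(rate)), "ns per packet")
```

Under these assumptions a single 100 Mb/s port delivers roughly 148,810 minimum-size
packets per second, leaving about 6.7 microseconds per forwarding decision on that port.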

While the discussion of this invention is described using layer 2 and a
combination of layers 3 and 4, one skilled in the art would recognize that searching on
and creating entries in the forwarding memory 40 for any portion of a packet or its
header, or any combination thereof, readily flows from the description. Thus, this
invention is not limited to any specific implementation of layers according to the ISO
standard.

Fig. 4 illustrates the forwarding logic 52 in more detail. The forwarding logic
52 includes class logic 60, layer 2 (L2) logic 62, layer 3 (L3) logic 64, and merge
logic 66. The input port 50i connects to the class logic 60, the L2 logic 62, the L3
logic 64, and the merge logic 66. Only one input port 50i is shown for simplification,
though all input ports 50 are connected in a similar manner. Preferably, the forwarding
logic 52 is not duplicated for each input port 50i. Instead, all input ports 50 arbitrate
for access to the forwarding logic 52.

The L2 logic 62 is connected to the forwarding memory 40 and is responsible
for creating a key to match against the entries stored in the forwarding memory 40 for
layer 2 forwarding decisions. Depending on the configuration of the forwarding
memory 40, the key may be applied against all or some of the entries of the
forwarding memory 40.

During operation, the input port 50i receives a packet from the multi-layer
network element port 38i and sends the header plus the input port 50i identifier to the
forwarding logic 52. The forwarding logic 52 first searches the forwarding memory 40
to determine whether the forwarding memory 40 contains an entry for the layer 2
source transmitting the packet. A matching entry will exist if the multi-layer network
element 12 has previously received a packet from the same layer 2 source and has
learned which port it is connected to. If no matching entry exists, the forwarding logic
52 performs a learn function by placing an entry in the forwarding memory 40
including the source address. The forwarding logic 52 signals the processor 32 that it
has learned a new source address. In some instances, the layer 2 source will exist in
the forwarding memory 40, but will be associated with a different input port 50i than
the input port 50i of the incoming packet. In this instance, no matching entry will exist
in the forwarding memory 40 because a match depends on both the layer 2 source and
the input port 50i.
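
The source search and learn function can be sketched as follows. This is a simplified
Python model, not the hardware described here: the set of (address, port) pairs and the
notification list standing in for the signal to the processor 32 are assumptions, and
what happens to a stale entry after a station moves is deliberately left out because it
is not specified at this point in the text.

```python
class L2SourceTable:
    """Toy model of layer 2 source learning keyed on (source MAC, input port)."""

    def __init__(self) -> None:
        self.entries = set()       # learned (source MAC, input port) pairs
        self.notifications = []    # stand-in for signals sent to the processor

    def source_search(self, src_mac: str, in_port: int) -> bool:
        """Return True on a match; otherwise learn the pair and return False."""
        key = (src_mac, in_port)
        if key in self.entries:
            return True
        # No match: either the source is new or it was seen on another port.
        self.entries.add(key)
        self.notifications.append(key)
        return False
```

Because the key combines the address with the port, a known source arriving on a
different port is treated as a miss and is learned again, as the paragraph above describes.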

The forwarding logic 52 also searches the forwarding memory 40 for an entry
indicating the port of the destination address. If no match is found, then the forwarding
logic 52 instructs the input port 50i to flood the packet to all of the active output ports
56.

For the layer 2 information described above in the preferred embodiment, the
forwarding memory 40 contains the values of the MAC addresses of the sources and a
pointer to a corresponding entry in the associated memory 42. The forwarding memory
40 may also contain additional layer 2 information such as a VLAN identifier if tagged
packets are being used. The associated memory 42 contains more information about its
corresponding entry in the forwarding memory 40. Layer 2 information in the
forwarding memory 40 is preferably limited to the least amount of information
necessary to make a layer 2 search. In a layer 2 search, this information is preferably
just the MAC address and the input port 50i, but the CAM may also contain any
information relating to tagged addressing.

In a preferred embodiment, the forwarding memory 40 allows multiple matches
for a layer 2 search. The processor 32 ensures that the order of the entries is such that
if an address/port combination exists in the forwarding memory, that entry is selected.
If the particular source/port combination is not found, then a match may occur
including VLAN information so that any layer 2 destination search will at least match
a known VLAN or an unknown VLAN entry, each of which defines the output ports 56
for flooding in its respective entry.

The L3 logic 64 is connected to the forwarding memory 40 and is responsible
for creating a key to match against the entries stored in the forwarding memory 40 for
layer 3 forwarding decisions. As with the L2 search key, the L3 key may be applied
against all or some of the entries of the forwarding memory 40.

To create the key, the L3 logic 64 uses information from the input port 50i
including the packet header and an input port 50i identifier, and information from the
class logic 60. The merge logic 66 is connected to the class logic 60, the associated
memory 42, the packet memory manager 54, and the processor 32. The merge logic
66 uses information from the class logic 60 and information output from the associated
memory 42 to instruct the input port 50i what to do to properly forward the packet to
its appropriate destination(s). In some instances, there is no appropriate destination and
the packet is discarded. In other instances, the merge logic 66 will signal the processor
32 that it must perform some task in response to the received packet.

Layer 3 switching, while more complex, is similar to layer 2 switching. The
forwarding logic 52 searches the forwarding memory 40 for a matching entry to a
layer 3 search key created by the L3 logic 64. If a match exists, the information in the
associated memory 42 is used by the merge logic 66 to instruct the input port 50i what
to do with the packet. If the search provides no match, the switching element 36
forwards the packet as a bridge and may pass all or portions of the packet to the
processor 32 for further processing. The L3 logic 64 creates the search key using
information from the packet header, the input port 50i, and the class logic 60.

The class logic 60 examines information in the packet header to determine any
encapsulation information and to determine a class for the layer 3 information and is
illustrated in more detail in Fig. 5. The class logic 60 includes the encapsulation logic
68 and the class action logic 70. Each input port 50i is connected to both the
encapsulation logic 68 and the class action logic 70. The class action logic 70 is
connected to the encapsulation logic 68, the L3 logic 64, and the merge logic 66.

The encapsulation logic 68 is responsible for examining the packet header and
determining any offsets into the header for the layer 3 and layer 4 information, if
needed. The encapsulation logic 68 includes class filters 72 to determine any offsets
into the packet to identify locations of relevant information. In a preferred embodiment,
one filter 72 recognizes an implementation in accordance with the IEEE 802.3
Standard Ethernet header, another filter 72 recognizes an implementation in
accordance with the IEEE Standard 802.1Q Tagged Ethernet header, and still another
recognizes an LLC SNAP encapsulation. Other encapsulations would become readily
apparent to one skilled in the art and could be implemented with additional
encapsulation filters 72. The encapsulation logic 68 passes encapsulation offsets to the
class action logic 70 so that the class action logic 70 knows from where in the packet
to draw the appropriate field information.
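
As an illustration of the kind of offset the encapsulation filters 72 might compute,
the Python sketch below locates the start of the layer 3 header for the three
encapsulations named above. The header lengths are the standard ones (14-byte Ethernet
header, 4-byte 802.1Q tag, 8-byte LLC/SNAP header); the function name and the decision
to return None for anything else are assumptions made for this example.

```python
import struct
from typing import Optional

ETH_HEADER_LEN = 14      # destination MAC, source MAC, type/length
VLAN_TAG_LEN = 4         # 802.1Q tag: TPID plus tag control information
LLC_SNAP_LEN = 8         # DSAP, SSAP, control, 3-byte OUI, 2-byte protocol type
TPID_8021Q = 0x8100
MAX_LENGTH_FIELD = 1500  # values up to this are a length, not an EtherType


def layer3_offset(frame: bytes) -> Optional[int]:
    """Return the byte offset of the layer 3 header, or None if unrecognized."""
    (type_or_len,) = struct.unpack_from("!H", frame, 12)
    offset = ETH_HEADER_LEN
    if type_or_len == TPID_8021Q:
        # Tagged frame: the real type/length field follows the 4-byte tag.
        offset += VLAN_TAG_LEN
        (type_or_len,) = struct.unpack_from("!H", frame, 16)
    if type_or_len <= MAX_LENGTH_FIELD:
        # Length field: an LLC header follows; accept only LLC/SNAP here.
        if frame[offset:offset + 3] != b"\xaa\xaa\x03":
            return None
        offset += LLC_SNAP_LEN
    return offset
```

A plain Ethernet frame therefore yields an offset of 14, a tagged frame 18, and an
untagged LLC/SNAP frame 22.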

The class action logic 70 determines to which class a packet belongs. A class is
used by both the L2 and L3 logics to aid in searching and to add to the functionality
of the multi-layer network element 12. The L2 logic 62 applies a single class to all
layer 2 searches. Layer 3, on the other hand, has a plurality of programmable classes.

The classes define a class type and, for each class, which bytes from the
packet header should be used in creating the layer 3 search key by the L3 logic
64, its priority, and a default class result that defines what should happen if no layer 3
match occurs in the forwarding memory 40.

In a preferred embodiment, there are four possible outcomes when no match
occurs. First, the header may be sent to the processor 32. This is contemplated when
the possibility of identifying a layer 3 flow exists. Second, the entire packet could be
copied to the processor 32. This is contemplated when initially setting a unicast route
or to provide firewall protection by initially examining certain routes or flows or when
it is unknown where in the packet required information may exist to create search
keys. Third, layer 2 results may be used for forwarding. Fourth, the packet may be
discarded. Other actions may be possible depending on the configuration of the network
or the particular protocol in use as would become readily apparent to one skilled in the art.

Some of the criteria that the classes take into account may be whether the class
is considered address dependent or address independent. Adding a class identifier
allows the switching element 36 to respond to varying network situations and greatly
simplifies organizing and storing information in the forwarding memory 40.
Representative examples of address independent classes that could be identified
by the class logic 60 include: Address Resolution Protocol (ARP); Internet Group
Management Protocol (IGMP); Reverse ARP (RARP); Group Address Registration
Protocol (GARP); Protocol Independent Multicast (PIM); and Reservation Protocol
(RSVP). Representative examples of address dependent classes include: TCP flow;
non-fragmented UDP flow; fragmented UDP flow; hardware routable IP; and IP version 6.
Of course, other protocols could be similarly recognized.

The class logic 60 produces an unambiguous class result for every incoming
packet. For an unrecognized protocol, the class logic 60 will still produce a class
result, but that class result signifies an unrecognized protocol and what actions should
take place on a packet of this type of class.

Generally, layer 3 flows are address dependent and will contain information
beyond just a simple class of traffic. In those instances where additional information
has been placed by the processor 32 into the forwarding memory 40, there may be
more than one entry for a particular class in the forwarding memory 40. The processor
32 ensures that of the entries matched, the one used is the most appropriate one.
Different classes may have different criteria for what is the most appropriate match
depending on the type of packets embodied within a particular class. The flexibility
allowed by having multiple matching entries in the forwarding memory 40 is further
enhanced by ensuring that the best match is provided for a particular flow and because
of this feature, different actions will be possible for packets within the same type of
class.

In the preferred embodiment, the processor 32 reorders the layer 3 entries when
it places any new layer 3 entry so that the best match for a particular search criteria occurs
earliest in the memory. Those skilled in the art will recognize many different
implementations to achieve the same result. In one preferred embodiment, the
processor 32 ensures that the entry with the longest potential matching key within a
particular class is at the top, or earliest, location in the memory. However, the
processor 32 may also place an entry above the longest matching entry so that for a
particular traffic pattern the most important match may be one that matches many
keys. For example, an entry that matches, for a particular class, based on an
application port such as "http" and no other information, will take precedence over
entries that might match more than just the layer 4 application. Another example might
be forcing a match on a particular source within a class type. This might occur when
the operator might want to provide packets coming from a particular server with a high
priority regardless of the destination or layer 4 application.
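
The ordering policy described above can be modeled with a short Python sketch in
which the first matching entry wins. It is only an approximation for illustration: the
entry fields, the use of None as a wildcard, and the separate "pinned" list for entries
deliberately placed above the longest match are assumptions, not structures taken from
the specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class L3Entry:
    traffic_class: str
    action: str
    src: Optional[str] = None        # None acts as a wildcard
    dst: Optional[str] = None
    app_port: Optional[int] = None

    def key_length(self) -> int:
        """Number of fields this entry actually constrains."""
        return sum(f is not None for f in (self.src, self.dst, self.app_port))

    def matches(self, traffic_class, src, dst, app_port) -> bool:
        return (self.traffic_class == traffic_class
                and self.src in (None, src)
                and self.dst in (None, dst)
                and self.app_port in (None, app_port))


class L3Table:
    def __init__(self) -> None:
        self.pinned: List[L3Entry] = []    # entries forced above the longest match
        self.regular: List[L3Entry] = []   # kept sorted, longest key first

    def add(self, entry: L3Entry, pinned: bool = False) -> None:
        if pinned:
            self.pinned.append(entry)
        else:
            self.regular.append(entry)
            self.regular.sort(key=L3Entry.key_length, reverse=True)

    def lookup(self, traffic_class, src, dst, app_port) -> Optional[L3Entry]:
        """Return the earliest matching entry, mimicking first-match selection."""
        for entry in self.pinned + self.regular:
            if entry.matches(traffic_class, src, dst, app_port):
                return entry
        return None
```

With this arrangement, a pinned entry that matches only on the application port takes
precedence over a more specific entry for the same class, as in the "http" example.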

In a preferred embodiment, the merge logic 66 directs the input port 50i to take
one of the following actions on a packet: filter the packet; forward the packet at layer
2; forward the packet as a layer 3 flow; process the packet as a layer 3 route; and
forward the packet as a multicast route. Packets that the merge logic 66 instructs the
input port 50i to filter are those that include certain header information determined to
be unsupported. Examples of classes whose packets would be forwarded at layer 2
would include a fragmented UDP flow and a class indicating that the header
information is unknown. A fragmented UDP flow operates using layer 2 information
because after the first packet, the fragmented packets do not include all relevant
information from the layer 4 header, UDP ports for example. Layer 2
forwarding would be optional for address independent classes depending on the
particular class.

The merge logic 66 instructs the input port 50i to use layer 3 flow information
for TCP or non-fragmented UDP flows. Flows are those packets forwarded within the
subnet to which the multi-layer network element 12 is attached and require no header
modification on forwarding. Routes, on the other hand, are packets coming from
sources outside the subnet or destined to addresses beyond the subnet such that the
header information must be modified prior to forwarding by the multi-layer network
element 12. In a preferred embodiment, instructions to forward the packet as a layer 3
route come from the merge logic 66 when the class indicates that the packet is of a
class hardware routable IP. In other words, the destination of the incoming packet is
recognized by the class logic 60 of the multi-layer network element 12, and the
multi-layer network element 12 must then forward the packet to the next hop destination,
which is determined by routing protocols. Those skilled in the art can easily recognize
from the invention other situations where such a type of result would be desired.

One feature of the invention is the ability to bridge flows, that is, use the
forwarding memory to quickly forward layer 2 packets using layer 3 functionality
through the network element 12. Certain flows are particularly suited for this type of
activity and include static flows, self-detecting flows, and flows set up by reservation
protocols, such as RSVP. Static flows are those set up in advance by the network
element 12 operator and define layer 3 functionality for selected layer 2 network traffic
and are not subject to aging. Self-detecting flows are a function of the type of
application.

Initially, these flows are bridged with no layer 3 functionality because no layer
3 entry matches. The packet header is sent to the processor 32 for examination. The
processor 32 analyzes the packet and based on programmed heuristics determines
whether and how to create a layer 3 entry in the forwarding memory 40 for the packet
type. For example, a "ping" packet would not warrant a layer 3 flow entry because it
is, at best, a transient packet.

Protocols like RSVP work to reserve certain service features of the network and
signal that a number of packets will follow this same path. In this case, it serves the
application using the reservation protocol to forward at layer 2, but add layer 3, or
more, functionality like priority to ensure the required class of service through the
multi-layer network element 12.

Fig. 6 illustrates preferred results produced by the merge logic 66 using
information from the class logic 60 and the associated memory 42. Three results are
presently preferred: (1) use the layer 2 forwarding results; (2) use the layer 3
forwarding results; and (3) use the layer 3 results while using the layer 2 topology. In
some instances, there may be an identified class, but no matching entry in the
forwarding memory 40. In this instance, the default actions for the class are used.
Note that the use of layer 3 default results can be considered a subset of using layer 3
forwarding results.

Default results may be set for packets of a class type to provide protection such
as that provided by firewall technology. In a firewall application, the multi-layer
network element 12 would be programmed to direct any packet of a defined class to
the processor 32 for subsequent processing.

Referring to Fig. 6, if the class logic 60 determines that the packet is of an
unrecognized class (step 112), then the packet is acted on using the layer 2 results
(step 114). If the packet's class is recognized (step 112) and the associated memory 42
or class logic 60 indicates that a layer 2 result should be forced (step 116), then the
layer 2 results are used (step 118) regardless of any other information.

If no layer 2 results are forced as a result of the layer 2 search (step 116) and
there is a match of the layer 3 key (step 120), then the layer 3 information is checked
to determine whether the layer 3 information forces a layer 2 port decision (step 122).
If the layer 3 information forces a layer 2 forwarding result, then the output port is
determined by the results of the layer 2 search; however, any other information found
in the results of the layer 3 search, such as QoS factors, is applied (step 124). If the
layer 3 results do not call for forcing a layer 2 forwarding result, then the layer 3
results are passed on to the input port 50i (step 126). If there is no layer 3 match in
step 120, then the default actions for the class generated by the class logic 60 are
passed to the input port 50i (step 128). It is also contemplated that a packet is sent to
the processor 32 without being forwarded to any output port 56 by the input port 50i
when using L3 class default action.
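
The decision sequence of Fig. 6 can be summarized in a small Python sketch. The step
numbers in the comments follow the text; the Decision and L3Result structures, their
field names, and the representation of priority as an optional integer are assumptions
introduced only to make the example self-contained.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class L3Result:
    ports: FrozenSet[int]
    priority: int
    force_l2: bool = False   # layer 3 entry asks for the layer 2 port decision


@dataclass(frozen=True)
class Decision:
    ports: FrozenSet[int]
    priority: Optional[int]
    source: str


def merge(class_recognized: bool, force_l2: bool, l2_ports: FrozenSet[int],
          l3_match: Optional[L3Result], class_default: Decision) -> Decision:
    """Choose the forwarding result in the order described for Fig. 6."""
    if not class_recognized:                        # step 112 -> step 114
        return Decision(l2_ports, None, "L2")
    if force_l2:                                    # step 116 -> step 118
        return Decision(l2_ports, None, "L2")
    if l3_match is None:                            # step 120 -> step 128
        return class_default
    if l3_match.force_l2:                           # step 122 -> step 124
        # Layer 2 picks the ports; layer 3 still contributes QoS information.
        return Decision(l2_ports, l3_match.priority, "L3 on L2 topology")
    return Decision(l3_match.ports, l3_match.priority, "L3")   # step 126
```

The third branch from the bottom is where a class default, such as sending the packet
to the processor 32, would be returned when no layer 3 entry matches.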

Thus, if the class is recognized and the layer 3 search matches an entry, then
the actions defined by the layer 3 search govern the instructions to the input port 50i,
even though that might mean that the layer 2 output port results are used. If not, the
packet is treated using layer 2 results and the packet or the packet's header might be
sent to the processor 32 for subsequent processing of the layer 3 information, if
desired.

If the information coming out of associated memory 42 for a layer 3 match
indicates a force layer 2 result, then packet forwarding is done using the layer 2
results, but any information relating to quality of service may still be implemented on
a layer 2 forwarding decision. In this way, the multi-layer network element 12 may
add additional functionality above and beyond normal layer 2 bridges by allowing
quality of service factors to be applied to layer 2 bridging or routing within the same
subnet or VLAN.

Accordingly, the input port 50i presents to the forwarding logic 52 the header
of the received packet and its port designation. The output of the forwarding logic 52
is a function of the header information and the arrival port and indicates whether the
input port 50i should store the packet in the packet buffer memory 44 in cooperation
with the packet memory manager 54; whether any priorities should be associated with
the packet on a particular output port 56i; and whether the input port 50i should make
any modifications to the packet such as header replacement prior to passing the packet
to the packet buffer memory 44. Thus, an output port 56i need not make any
modifications to the header except for inserting its MAC address and computing a new
packet checksum when routing unicast or multicast packets, for example.

The layer 2 and layer 3 information in the forwarding memory 40 are
independent of each other as applied to searches although some information contained
in a layer 2 entry may be duplicated in a layer 3 entry. Additionally, a layer 3 entry
may also contain some layer 4 information such as the UDP or TCP ports. Those
skilled in the art would readily recognize other features that could be added by
including other information from other header layers or the packet body and such are
considered to be within the scope of this invention. After both the layer 2 and layer 3
searches are completed, the merge logic 66 determines what actions the input port 50i
should take on the packet.

Any layer 2 learning of source addresses, or changes that might occur as a
result of a topology change, are communicated to the processor 32 as part of the layer
2 source search. As mentioned earlier, the layer 2 information may include tagged
information like that used to support virtual LAN (VLAN) information. When and if
used, the VLAN information helps to restrict layer 2 flooding to only those ports
associated with a particular VLAN or specific tagging.

Each entry in the associated memory 42 may contain information relating to the
following outcomes. The entry includes an indication of the output ports 56 for the
packet including whether all or portions of the packet should be sent to the processor
32. The entry allows for more than one port 56i to be specified, if needed, to support,
for example, multicast addressing. The entry also includes a priority for
the packet which maps into the number of output queues which may be present on an
output port 56. The entry also includes an indicator for which output ports 56 should
use Best Effort in transmitting the packet. Best Effort implies that no guarantee on the
packet's transmission or QoS is provided. Those skilled in the art will easily recognize
that the invention applies equally well to other QoS as well.

The entry may also indicate whether a new tag should be applied to an
outgoing packet, for example when routing between VLANs requires an
outgoing tag different from the incoming tag, and what that tag should be, if necessary.

The entry also contains information relating to source and destination aging.
Source aging information indicates whether the source is active or not. In a preferred
implementation, this information is updated by the forwarding logic 52 every time the
layer 2 source address is matched. The information is implemented in accordance with
IEEE Standard 802.1D type address aging. Destination aging in the network element 12
indicates which layer 2 and layer 3 entries are active. The information for an entry is
updated every time an entry is matched, either by a layer 2 destination search or a
layer 3 match cycle for the entry.

The entry also provides for whether layer 2 results should be used for
forwarding by the input port 50i. As mentioned above, the layer 2 information may be
forced for a layer 3 entry but in addition to the layer 2 forwarding information, layer 3
functionality may be added to the layer 2 forwarding.

The entry may also define a static entry. A static entry is not subject to layer 2
learning and is never aged.
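
The interaction between aging and static entries can be sketched in Python as follows.
The timestamp field, the touch and sweep helpers, and the max_age parameter are
hypothetical; the text above says only that aging information is updated whenever an
entry is matched and that static entries are never aged.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgedEntry:
    key: str
    last_hit: float        # time of the most recent match, in seconds
    static: bool = False   # static entries are exempt from aging


def touch(entry: AgedEntry, now: float) -> None:
    """Record that the entry was matched by a search at time now."""
    entry.last_hit = now


def sweep(entries: List[AgedEntry], now: float, max_age: float) -> List[AgedEntry]:
    """Keep static entries and entries matched within the last max_age seconds."""
    return [e for e in entries if e.static or now - e.last_hit <= max_age]
```

A periodic sweep of this kind would typically be run by software, while the per-match
update corresponds to the refresh performed on every successful search.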

Entries for layer 3 may include additional information. The entry may indicate
that only the first 64 bytes of the packet should be sent to the processor 32 for
subsequent processing. The entry may indicate whether the packet is part of a multicast
routing. If so, then the input port 50i should decrement the header checksum, forward
the packet to the indicated output ports 56, and indicate that the output port 56i needs
to replace the layer 2 source address of the packet with the output port 56i's MAC address.
Other types of header modifications will be readily apparent to those skilled in the art
to implement proper routing.

The entry in the associated memory 42 may also include the next hop
destination address to be used to replace the incoming destination in unicast routing. In
a unicast route, the incoming packet would have had its destination address as the
multi-layer network element 12.

The merge logic 66 must wait for the results of the searches of the forwarding
memory 40 done by the L2 logic 62 and the L3 logic 64. In the preferred embodiment,
the layer 2 and layer 3 information are stored in the same forwarding memory 40;
however, they could be stored in separate memories. As stated earlier, the preferred
embodiment has the forwarding memory 40 limited to storing the information used by
the L2 and L3 logics that match the fields of the key to reduce the size of the
forwarding memory. As such, the associated memory 42 stores additional information
about the entries. Each entry in the forwarding memory 40 points to a corresponding
entry in the associated memory 42, the contents of which the associated memory 42
provides to the merge logic 66 to make its forwarding decisions.

Fig. 7 illustrates the steps occurring in the forwarding logic 52. While
Fig. 7 illustrates the preferred embodiment of the operation of the forwarding logic 52,
those skilled in the art will immediately recognize other equivalent ways to accomplish
the same task. Information is received at the forwarding logic 52 from the input port
50 (step 200). On one path, the L2 logic 62 determines the necessary information for a
layer 2 search and carries out that search against the forwarding memory 40 (step 202).
The L2 logic 62 and forwarding memory 40 determine whether there was a
matching entry for the source of the packet (step 204). If the source address is not in
the forwarding memory 40, then the source address is learned (step 206). To learn the
source address, the L2 logic 62 and the forwarding memory 40 ensure that an entry is
placed in the forwarding memory. A signal is sent to the processor 32 to examine the
new information.

If the source address was already in the forwarding memory 40 and matched to
the input port 50 of arrival, then the L2 logic 62 attempts to match the destination
address to the forwarding memory 40 (step 208). If the source address was not in the
forwarding memory 40 or the source address was in the memory but at a different
port, then the source address and port combination is learned in step 206 prior to
attempting a destination search in step 208.
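As an illustration only, the layer 2 path of Fig. 7 (steps 202 through 208) may be sketched in Python as follows. The dictionary standing in for the forwarding memory 40, the function name l2_lookup, and the short address strings are assumptions made for this sketch and are not part of the disclosure.

```python
# Hypothetical model of the layer 2 path of Fig. 7 (steps 202-208).
# A dict stands in for the forwarding memory 40: address -> port of arrival.

def l2_lookup(forwarding_memory, src_mac, dst_mac, in_port):
    """Return (destination port or None, True if the source was learned)."""
    learned = False
    # Steps 204/206: learn when the source is unknown or seen on a new port.
    if forwarding_memory.get(src_mac) != in_port:
        forwarding_memory[src_mac] = in_port
        learned = True  # the described element also signals the processor 32
    # Step 208: the destination search runs only after any learning.
    return forwarding_memory.get(dst_mac), learned


table = {}
first = l2_lookup(table, "aa", "bb", 1)   # unknown source and destination
reply = l2_lookup(table, "bb", "aa", 2)   # destination "aa" is now known
print(first, reply)
```

In this sketch a source that moves to a different port is simply overwritten, which mirrors the statement that the source address and port combination is learned before the destination search.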
In the other path from step 200, the class logic 60 determines the class in step
210. After the class logic 60 has determined the class and passed this on to the L3 logic
64, the L3 logic attempts a match against the forwarding memory for the layer 3 entry
(step 212).

In step 214, the merge logic 66 uses information from the L2 search of step
208, if there was one, the class logic results from step 210, and the layer 3 search
results from step 212 to make the appropriate forwarding decisions based on the
criteria of Fig. 6. Once the merge logic 66 has determined the appropriate forwarding
decision in step 214, the results are passed to the output port 50i (step 216).

Fig. 7 illustrates the flow proceeding down two paths. Because the layer 2 and
layer 3 searches are independent, everything but the actual memory search may be
pipelined or accomplished in parallel. In a preferred implementation, the processing by
the class logic 60, the L2 logic 62, and L3 logic 64 may proceed in a parallel or
pipelined fashion except where dependencies prevent such action. For example, the L3
logic 64 requires the output of the class logic 60 to create the search key for the layer
3 search and the merge logic 66 requires that the layer 2 and layer 3 searches be
finished to merge the results according to Fig. 6.

In another embodiment, however, the L2 information and the L3 information
may be in separate memories. In this case the L2 and L3 searches may occur
simultaneously.

After the merge logic 66 determines the actions on the packet, the input port
50i makes write requests to the packet manager 54 if the packet is not to be filtered or
dropped. The packet need not be received in its entirety before the input port 50i
makes write requests to the packet manager 54. The input port 50i passes to the packet
manager 54 the address where the incoming portion of the packet is to be stored, the
number of output ports 56 to which the packet will be output, and the priority of the
packet, and then delivers the pointers to the appropriate output port(s) 56. The input port 50i
receives pointers to free memory locations in the packet buffer memory 44 where the
packet may be placed. Preferably, the input port 50i has obtained a pointer from the
packet buffer manager 54 prior to making write requests.

The output port 56i stores the pointers in output queues for packet transmission.
When a queue presents a pointer for transmission, the output port 56i requests the
contents stored at the pointer address from the packet manager 54 and transmits the
contents out of the multi-layer network element 12 on the corresponding network
element port 38. The packet manager 54 keeps track of whether all of the output ports
56 using a particular pointer have transmitted the contents associated with that pointer;
if so, the memory space is freed for future use.

Packets in the network element 12 are buffered at each output port 56i before
the packet is transmitted across the physical medium to the next or final destination.
Queueing at both the input port 50i and the output ports 56i is based on pointers.
Each of these pointers points to a storage location in the packet buffer memory 44
where the packets are stored. The pointers are passed from the input port 50i to the
appropriate output ports 56. Each output port 56i requests the contents of the
pointed-to location from the packet memory manager 54 when a packet is to be transmitted.
For multicast packets, only one copy of the packet is kept in the packet buffer memory
44 along with a count of the number of output ports 56 to which the packet has to be
sent.
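The single-copy buffering with a per-pointer port count may be pictured with the following Python sketch. It is a simplified model under stated assumptions: the class name PacketBuffer, its method names, and the use of a list as the free-pointer pool are hypothetical and only illustrate how the memory space can be freed once every output port has transmitted the contents.

```python
# Hypothetical model of the packet buffer memory 44 and its manager 54:
# one stored copy per packet plus a count of output ports still to send it.

class PacketBuffer:
    def __init__(self, size):
        self.free = list(range(size))  # pool of free pointers
        self.data = {}                 # pointer -> packet contents
        self.pending = {}              # pointer -> output ports still to send

    def write(self, packet, num_output_ports):
        """Store one copy and return the pointer handed to the output ports."""
        pointer = self.free.pop()      # raises IndexError when no pointer is free
        self.data[pointer] = packet
        self.pending[pointer] = num_output_ports
        return pointer

    def read_for_transmit(self, pointer):
        """Return the contents; free the space after the last port has read."""
        packet = self.data[pointer]
        self.pending[pointer] -= 1
        if self.pending[pointer] == 0:
            del self.data[pointer], self.pending[pointer]
            self.free.append(pointer)
        return packet


buf = PacketBuffer(size=4)
ptr = buf.write(b"multicast payload", num_output_ports=2)
buf.read_for_transmit(ptr)   # first output port; the copy is retained
buf.read_for_transmit(ptr)   # second output port; the space is freed
```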

Each output port 56i has a plurality of output queues Qi. In the preferred
embodiment, each of the output ports 56i includes three queues. However, the concepts
embodied in this invention are not limited to any particular number of output queues.
While one skilled in the art will recognize that hardware-implemented queues can be
implemented in various ways, the preferred embodiment includes, at each output port
56i, a single physical queue that is divided into n logical queues, preferably 3.

Fig. 8 illustrates a more detailed view of an output port 56i including a logical
view of the output queues Qi. Fig. 8 illustrates output queues Q1 ... Qi ... Qn. Each of
the queues Qi is connected to a transmitting logic 300 that transmits the packets
pointed to by the queue Qi when instructed.

Each queue has a pair of pointer registers to indicate the beginning and the end
of the queue. For each queue Qi, Qistart stores the location of the beginning of the
queue Qi, and Qiend stores the location of the end of the queue Qi.

The maximum number of pointers that is allowed per output port 56i is limited
by the hardware storage, and in the preferred embodiment this is 1K. The concepts
embodied in the invention, however, are not limited to any particular maximum
number of storage locations. While the maximum number of pointer storage locations
is limited for a particular implementation, the size of the queues that share the
maximum number of storage locations is variable. For example, in the preferred
embodiment, the number of storage locations in the physical queue is limited to 1K,
but the sizes of the n queues themselves need only aggregate to 1K. In this way, each
of the logical queues Qi may be of a different size.
The relative distribution of the 1K storage locations among the logical queues,
Qi, is programmable on a per-output-port 56i basis by the processor 32. The relative
distribution can be changed by the processor 32 at any time (depending on the traffic
flow) and the changes will take effect as soon as the affected queue regions are empty
and the pointers can be reassigned.
Providing multiple output port queues per output port 56i enables traffic
mapping to required quality of service (QoS) type functions and other factors. The
network element 12's flexibility in queueing is given by programmability on a
per-output-port 56i basis for: (1) classification into the number of queues Qi at output port
56i; (2) scheduling transmission from the queues; and (3) Qi behavior upon congestion.

Classification of packets into different queues results from global priority
information output to the input port 50i by the forwarding logic 52, which the input
port 50i passes to the output port 56i. Global priority information is associated with
each packet and is present as a part of the associated data in the associated memory
42. Global priority information may be mapped from the priority information present
in the VLAN tags and would be based on the IEEE 802.1Q standard.

Output ports 56i use the global priority information to determine to which
queue Qi a given packet will be forwarded as long as Force BE (Best Effort)
information, also in the associated memory 42 and associated with the packet, does not
indicate a Best Effort override of the global priority information. If the Force BE
information does indicate an override, then that packet will be sent to the low priority
queue. The implementation of the Force BE allows forcing Best Effort on a
per-output-port 56i basis. Preferably, this is done by having a Force BE field in the associated
memory 42 associated with an entry, the field having an indicator for each output port
56i. In the preferred embodiment, this is implemented using a single bit per output port
56i.

Included in the output port 56i is a mapping logic 302 that translates the global
priority information into a queue selection signal for storing the pointer from input port
50i.

The global priority information is contained in an associated memory 42 entry
field of three bits. The three bits are passed to the mapping logic 302, which outputs
the queue selection signal. The three global priority bits associated with the packet
entering the output port 56i from the buffer memory 44 are mapped by the mapping
logic 302 into two local priority bits in order to determine
the appropriate output queue for the packet, and then the mapping logic 302 generates
the queue selection signal. This mapping is determined by two programmable queue
priority threshold values found in the mapping logic 302. A first programmable priority
threshold register PTR1 stores a first threshold value and a second programmable
priority threshold register PTR2 stores a second threshold value. The mapping is
programmable by the processor 32 by changing the values in the threshold registers
PTR1 and PTR2.

The mapping of the three-bit global priority into the local two-bit priority using the
threshold values for Q1 to Qn, where n=3, is as follows:

If p < PTR1, then local priority = 01

If p >= PTR1 and p < PTR2, then local priority = 10

If p >= PTR2, then local priority = 11

where p is the value of the global priority field.

The local priority maps to the three output queues as follows:

00    --    Unused (Reserved)

01    Q1    Low priority, BE queue

10    Q2

11    Q3    High priority queue

While the mapping above has been described for the preferred embodiment, those
skilled in the art would readily recognize variations that would accomplish the result of
mapping any global priority associated with a packet to any number of queues in
an output port 56i.
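The threshold mapping and the Force BE override can be summarized by the short Python sketch below. It is illustrative only: the function name select_queue and its parameter names are hypothetical, and the sketch assumes that the first threshold does not exceed the second.

```python
# Hypothetical model of the mapping logic 302 for three queues.
# Assumes ptr1 <= ptr2; both stand for the threshold registers PTR1 and PTR2.

def select_queue(p, ptr1, ptr2, force_be=False):
    """Map a three-bit global priority p to a two-bit local priority."""
    if not 0 <= p <= 7:
        raise ValueError("global priority is a three-bit field")
    if force_be or p < ptr1:
        return 0b01  # Q1, the low priority (Best Effort) queue
    if p < ptr2:
        return 0b10  # Q2
    return 0b11      # Q3, the high priority queue


# With PTR1 = 2 and PTR2 = 5, priorities 0-1 go to Q1, 2-4 to Q2, 5-7 to Q3.
print([select_queue(p, 2, 5) for p in range(8)])
```

Because the thresholds are ordinary parameters here, changing them redistributes the eight global priority values among the queues, which corresponds to the processor 32 reprogramming the threshold registers.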
The output port 56i also includes a scheduler 304. The goal of the scheduler
304 is to allocate fixed rates to each queue Qi for transmission within output port 56i.
Because the scheduler 304 is associated with a particular output port 56i, the network
element 12 is capable of programming different schedules for each output port 56i.

In the preferred embodiment, the network element 12 supports both strict
priority and weighted round robin priority schemes. Each queue Qi in each output port
56i has associated with it three programmable registers which contain the weights to be
used for their associated queue.

For the sake of simplicity, the following discussion and Fig. 8 illustrate
relationships only for queue Qi. The discussion applies directly to all of the queues.
Also, throughout this discussion we assume three queues although, as mentioned
above, the extension of this scheme to more queues is straightforward.

In implementing the strict priority scheduling method, the scheduler 304 will
not service the lower priority queues as long as there are packets in the higher priority
queues. This implies that the highest priority queue can potentially starve the lower
priority queues. The network element 12 also provides a weighted round robin method
as an alternative.
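The starvation risk of strict priority scheduling can be seen in the following minimal Python sketch. The function name strict_priority_next and the packet labels are hypothetical; the sketch only shows that a lower priority queue is served when every higher priority queue is empty.

```python
# Hypothetical model of strict priority selection among the queues of one port.
from collections import deque


def strict_priority_next(queues):
    """Return the next packet; `queues` is ordered highest priority first."""
    for queue in queues:
        if queue:
            return queue.popleft()
    return None  # every queue is empty


q3, q2, q1 = deque(["h1", "h2"]), deque(), deque(["l1"])
order = [strict_priority_next([q3, q2, q1]) for _ in range(4)]
print(order)  # "l1" is served only after Q3 has drained
```

If Q3 were refilled continuously, the packet in Q1 would never be selected, which is the motivation for the weighted round robin alternative.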

In the weighted round robin method, each queue Qi has an associated weight,
stored in an associated weight register, Wi, storing the number of packets to be
transmitted during a round. The scheduler 304 polls each queue Qi and serves Wi
packets before servicing queue Q(i+1), the next queue, in the round. While illustrated
as a number of packets, a given weight Wi is also envisioned alternatively as the
number of bytes. This scheme attempts to give each queue Qi a rate proportional to its
respective weight. However, there is a trade-off between the flexibility and the packet
delay associated with queue Qi, since a large range of weights leads to large service
cycle times, and consequently large worst-case packet delays.

To achieve better performance, the scheduler 304 runs a weighted round-robin
scheme according to a frame structure. The scheduler 304 attempts to enforce the rates
over several polling rounds that comprise a frame. Frame length can be made much
smaller than the cumulative sum of the weights of the queues Qi within the output port
56i. This has the advantage of reducing the worst-case delay.

As an illustrative example of the framing method, wherein the output port 56i
includes three queues Qi, if queue Q1 has a weight of 2, queue Q2 has a weight of 4,
and queue Q3 has a weight of 6, a normal round would service 6 packets of queue Q3,
then 4 packets of queue Q2, and finally 2 packets of queue Q1. Using the frame
approach, a round could be made up of two frames. Accordingly, the round would
service frame 1, i.e., service 3 packets of queue Q3, 2 packets of queue Q2, and 1
packet of queue Q1, and then frame 2, i.e., service 3 packets of queue Q3, 2 packets of
queue Q2, and 1 packet of queue Q1. As mentioned above, the choice of the number
of frames is programmable and based on desired results.
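The two-frame example may be reproduced with the following Python sketch. The function names frame_quota and service_order are hypothetical, and the sketch assumes, for simplicity, that every weight divides evenly by the number of frames; the text does not state how uneven weights would be split.

```python
# Hypothetical model of splitting one weighted round into equal frames.
# Assumption: each weight is an exact multiple of the number of frames.

def frame_quota(weights, num_frames):
    """Packets each queue may send per frame."""
    if any(w % num_frames for w in weights.values()):
        raise ValueError("this sketch assumes weights divisible by num_frames")
    return {name: w // num_frames for name, w in weights.items()}


def service_order(weights, num_frames, order):
    """Sequence of queue names served over one full round."""
    quota = frame_quota(weights, num_frames)
    sequence = []
    for _ in range(num_frames):
        for name in order:  # queues are polled in the given order
            sequence.extend([name] * quota[name])
    return sequence


weights = {"Q1": 2, "Q2": 4, "Q3": 6}
print(service_order(weights, 2, ["Q3", "Q2", "Q1"]))
```

With one frame, queue Q1 waits behind ten packets of the other queues before it is served; with two frames it waits behind five, which illustrates the reduction in worst-case delay.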

Packets are serviced non-preemptively, which means that a packet's transmission
is not interrupted. A transmit register, TXi, is associated with each queue Qi. The TXi
register holds the number of bytes that can be transmitted from its queue in the current
round, or frame, if framing is being used.

The scheduler 304 services queue Qi and decrements the transmit register TXi
according to the number of bytes transferred until the value of this register is
less than or equal to zero. The scheduler 304 then starts processing queue Q(i+1) in
the round, or frame, and also updates the TXi register of the just-serviced queue Qi by
adding to the register TXi a quantum of bytes as represented by the value in the Wi
register. The TXi registers count below zero in order to align the number of bytes
transmitted to the packet boundary. That is, a queue Qi may finish transmitting a
packet even if the number of bytes to finish transmitting the packet causes the value in
the TXi register to drop below zero. This mechanism allows the scheduler 304 to take
into account for queue Qi in the subsequent round, or frame, any overrun in the current
round, or frame. When the value in the Wi register is added to the value in the TXi
register, the number of bytes that queue Qi may transmit during the next round, or
frame, is reduced by the amount queue Qi went over its allocation for the current
round, or frame.
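The byte accounting performed with the TXi and Wi registers may be sketched in Python as follows. This is a simplified software model, not the hardware implementation: the function name run_round is hypothetical, weights are expressed in bytes, and the sketch assumes that unused credit of a queue that runs empty is not carried forward, a case the text does not address.

```python
# Hypothetical model of one scheduling round with per-queue TX credit in bytes.
# Assumption: only an overrun (negative credit) carries into the next round.
from collections import deque


def run_round(queues, weights, tx):
    """Serve each queue once; each queue holds packet lengths in bytes."""
    sent = []
    for i, queue in enumerate(queues):
        # A packet is started while credit remains and is never interrupted,
        # so tx[i] may finish below zero.
        while queue and tx[i] > 0:
            length = queue.popleft()
            tx[i] -= length
            sent.append((i, length))
        # Reload with the weight; a negative remainder shrinks the next share.
        tx[i] = min(tx[i], 0) + weights[i]
    return sent


queues = [deque([600, 600, 600])]
weights = [1000]
tx = [1000]
print(run_round(queues, weights, tx), tx)  # overrun of 200 bytes is charged
print(run_round(queues, weights, tx), tx)
```

In the first round the queue sends 1200 bytes against an allocation of 1000, so its credit for the second round is 800 bytes rather than 1000.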
Congestion may occur in the network element 12 when resources are not
available for a packet. Some of the resources necessary to switch packets are input
buffers, space in the packet buffer memory 44, and output port 56 queue locations.
Input buffers are assigned to the packet upon arrival at the network element 12, while
output port queue entries are assigned when moving packets into output ports for
transmission. If the input buffers are not available for storing the packet at the input
port 50 or no pointers are available to store the packet in the packet buffer memory 44,
the packet will be discarded. A congestion logic 306 handles congestion in the output
port 56i for each queue Qi.

If an output port queue Qi is full, packets are discarded by not storing the
pointer in queue Qi; however, the other queues may not be full. Waiting for the queue
Qi to become full before dropping packets may not be desirable as this leads to tail
drop behavior. Also, if packets are discarded only when the queue Qi is full, and if
one flow having previously negotiated a particular QoS is exceeding its negotiated
parameters while others are not, it is likely that packets of well-behaved flows may be
dropped continuously while the misbehaved flow will be transmitted.

To balance all the flows going out of a particular queue Qi, discarding of
packets may begin even before the queue Qi is full. Each queue Qi has an associated
congestion register Ci that holds a threshold value which is less than the queue size.
When the number of queue Qi entries reaches the threshold value, a discard policy is
applied. Additionally, in the preferred embodiment, when the queue Qi becomes full, a
"queue full" interrupt is generated.

The queues Qi may at some time contain pointers to packets that are part of a
plurality of negotiated flows, such as those set up using a negotiated service-based
protocol, such as RSVP. If admission control is properly done by the processor 32 in
setting up the entries in the forwarding memory 40 and associated memory 42, and if
all the flows are generally following their traffic specifications, a queue-full interrupt
suggests that one of the flows is misbehaving and exceeding its assigned reservation.
In an attempt to detect misbehaving flows, all the flows that are destined for
the particular queue Qi causing the queue-full interrupt are monitored one at a time to
ensure that they are conforming to their reservations. This scheme detects misbehavior
of flows over a period of time. The processor 32, in response to a queue-full interrupt,
sets a count indicator in the associated memory 42 for an entry directing packets to the
output port 56i associated with the queue Qi. The processor 32 uses its knowledge of
the contents of the forwarding memory 40 and the associated memory 42 to determine
which flows, that is, entries in the associated memory 42, are directing packets to any
queue Qi. Alternatively, a desired aggregate flow may be counted by setting the count
indicators of multiple entries in the associated memory 42.

The indicator causes a shared packet count register PCR 67 (illustrated in Fig. 4)
to be incremented every time the entry in the associated memory 42 is accessed, that
is, every time a packet arrives at the network element 12 that is associated with a flow
for which the count indicator has been set. The PCR 67 is shared among all output ports 56.
A single PCR 67 allows options for packet counting. For example, if the count
indicator is set in only one entry in the associated memory 42, the PCR 67 will count
only those packets matching that entry. The processor 32 could also set several entries'
count indicators so that the PCR 67 may aggregate the packet counts for as many
entries as have their count indicators set.
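The behavior of the shared counter may be modeled with the short Python sketch below. The class name AssociatedMemory, its method names, and the flow labels are hypothetical; the sketch only shows that a single counter counts one flow or an aggregate depending on which count indicators are set.

```python
# Hypothetical model of per-entry count indicators feeding one shared counter.

class AssociatedMemory:
    def __init__(self):
        self.count_indicator = {}  # entry id -> True when counting is enabled
        self.pcr = 0               # stands for the shared packet count register

    def set_count(self, entry, enabled=True):
        """Set or clear the count indicator of one entry."""
        self.count_indicator[entry] = enabled

    def access(self, entry):
        """Model one packet matching `entry`; count it if the indicator is set."""
        if self.count_indicator.get(entry, False):
            self.pcr += 1


memory = AssociatedMemory()
memory.set_count("flow A")
for entry in ["flow A", "flow B", "flow A"]:
    memory.access(entry)
print(memory.pcr)  # only packets of flow A were counted
```

Setting the indicator on "flow B" as well would make the same counter aggregate both flows, at the cost of no longer distinguishing between them.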

The processor 32 continues selecting flows for counting until the processor 32
finds the flow exceeding its parameters or having an unusual traffic pattern that might
have caused the queue Qi to fill. When the misbehaving flow is found, the processor
32 causes the Force BE indicator (described earlier) in that entry in the associated
memory 42 for that particular output port 56i to be set. In practice, this implements a
strategy for punishing a flow that is exceeding its negotiated value (i.e., assigning a
lower priority to the misbehaving flow), so that other packets using the output port 56i
are not adversely affected by the misbehaving flow. Since the misbehaving flow's
packets are sent to the lowest-priority queue, the BE queue, the chance that its packets
will be dropped is increased. If the best-effort queue, i.e., the queue Qi having the
lowest priority, overflows, then the packets are discarded.

The congestion logic 306 uses a Random Early Discard (RED) algorithm to
randomly discard packets attempting to enter the queue Qi after the threshold value in
the programmable threshold register Ci associated with that queue Qi is met or exceeded.
The discard policy is preferably applied separately to each queue Qi. The processor 32
programs the marking probability of the discard algorithm depending on the traffic
class and the flow. When the number of packets queued in queue Qi meets or exceeds
the number in the register Ci, the packets entering queue Qi are randomly dropped. The
dropped packets may be transmitted to the processor with an indication that the
processor 32 should analyze the situation.
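The threshold-triggered random discard may be sketched in Python as follows. This is a deliberately simplified model under stated assumptions: the function name admit and its parameters are hypothetical, and a single fixed drop probability is used, whereas published RED variants typically work from an averaged queue length and a probability that rises with occupancy.

```python
# Hypothetical model of the admission decision made for one queue Qi.
# Assumption: a fixed drop probability applies once the threshold is reached.
import random


def admit(queue_len, capacity, threshold, drop_probability, rng=random):
    """Return True if the pointer of an arriving packet should be queued."""
    if queue_len >= capacity:
        return False  # queue full: the pointer is not stored
    if queue_len >= threshold and rng.random() < drop_probability:
        return False  # random early discard above the threshold
    return True


# Below the threshold every packet is admitted, whatever the probability.
print(admit(queue_len=5, capacity=10, threshold=8, drop_probability=0.5))
```

Passing a seeded random.Random instance as rng makes a run reproducible, which may be convenient when experimenting with different threshold and probability settings.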

When a sufficient number of packets from the same flow is being dropped,
there is a higher than normal probability that that flow is sending excess traffic. The
priority of such a flow is lowered via the Force BE field described earlier.

In addition to providing forced priority levels via the Force BE field in the associated
memory 42, the entries in the associated memory 42 (in conjunction with their
corresponding entries in the forwarding memory 40) may be configured to support
different QoS. Various situations may call for desired QoS results. One example is
allowing some layer 2 traffic within the subnet or across subnets to be given a higher
priority. This might be done for traffic from a high-end server, for example.

QoS may also be configured for application-specific traffic using a signalling
protocol (such as RSVP), or based on some other criteria. The network element 12
preferably supports an approximation of several traffic types as defined by the Internet
Engineering Task Force (IETF) Integrated-Services Working Group. Traffic or flows
which do not have any reservation or QoS associated with them are served as
best-effort traffic. One skilled in the art will easily recognize how to apply the
concepts of the invention to other traffic types.

In the preferred embodiment, the switching element 36 and all of its
constituents, the forwarding memory 40, and the associated memory 42 are all
implemented in hardware.

In an alternate preferred embodiment, the switching element 36 and all its
constituents are implemented in hardware on an application specific integrated circuit.
Equally contemplated, an integrated circuit could contain a hardware implementation of
switching element 36, and any combination or portion thereof, of the processor 32, the
processor memory 34, the forwarding memory 40, the associated memory 42, and the
packet buffer memory 44.

A multi-layer network element has been described that combines the features of
quick layer 2 bridge-type forwarding with the added functionality of
layer 3 routing and QoS support to create an apparatus and method of its use to
perform both layer 2 and most layer 3 forwarding decisions prior to the receipt of the
next packet.

The foregoing description of the preferred embodiments of the multi-layer
network element has been presented for purposes of illustration and description. It is
not intended to be exhaustive or to limit the invention to the precise form disclosed,
and modifications and variations are possible in light of the above teachings or may be
acquired from practice of the invention as disclosed. The embodiments were chosen
and described in order to explain the principles of the invention and its practical
application to enable one skilled in the art to utilize the invention in various
embodiments and with various modifications as are suited to the particular use
contemplated. It is intended that the scope of the invention be defined by the claims
appended hereto, and their equivalents.
CLAIMS

What is claimed is:

1. An apparatus for detecting and handling queue congestion in an output
port of a multi-layer network element comprising:

a central processor unit (CPU);

a switching element coupled to the CPU and configured to output
packets to a network through the output port, the switching element including:

at least one output queue having storage locations for packet
pointers, each pointer configured to point to portions of a packet to be transmitted on
the network, associated with the output port, and wherein the number of storage
locations is variable;

a start register configured to store a pointer to the storage
location at the front of the queue;

an end register configured to store a pointer to the storage
location at the end of the queue as determined by the number of storage locations;

a next-free register configured to store a pointer to the next
available storage location, wherein packet pointers are stored in the output queue
beginning at the location pointed to by the start register and the next-free register is
incremented as the next available storage location moves toward the second pointer;

a programmable threshold register configured to store a threshold
pointer to a storage location between the location represented by the start register and
the location represented by the end register;

threshold logic configured to output a congestion signal when the
value in the next free register represents a storage location logically located between
the location pointed to by the threshold register and including the storage location
pointed to by the end register;

random discarding logic configured to randomly select packets to
discard in response to the congestion signal, so that once the threshold is exceeded,
incoming packets are randomly discarded, using a packet discarding algorithm, such as
random early discard, a well known algorithm;

capacity logic configured to output a queue full signal to the
CPU when the value in the next free register is equal to the value in the end register;

a memory having at least one entry configured to store
information about forwarding decisions for the packet, wherein the entry is adapted to
indicate whether packets associated with that entry should be counted;

memory access logic configured to access the entry when an
incoming packet associated with that entry arrives at the switching element;

a packet counter configured to count the number of times the
entry is accessed, to represent an entry bandwidth;

a computer program mechanism coupled to the CPU configured to compare the
contents of the packet counter to a reservation-based protocol negotiated value for
lowering a priority of any future packet associated with the entry and destined for the
output queue.



2. An apparatus for handling multiple priorities for a multicast packet
being output from a network element on at least two output ports comprising:

at least a first output queue and a second output queue, the first output
queue having a priority higher than the second output queue, at each output port;

a memory configured to output forwarding information about the
multicast packet in response to a memory access based in part on a multicast address
of the multicast packet, the forwarding information including priority information
indicating to which output queue at each output port the multicast packet will be
directed.



3. The apparatus of claim 2, further including:

a central processing unit coupled to the memory;

a computer program mechanism coupled to the central processing unit
configured to modify the priority information based on an amount of packets being
transmitted through one of the output ports.

4. The apparatus of claim 2, further including:

a central processing unit coupled to the memory;

a computer program mechanism coupled to the central processing unit
configured to modify the priority information based on information communicated
between the network element and an intended recipient of the multicast packet.



WO 99/00949    PCT/US98/13364

5. An apparatus for queue scheduling in a network element comprising:

at least one output port configured to output packets, each packet having
a byte length;

at least two queues associated with each output port, configured to
queue packets to be output at each output port;

a weight register associated with each queue and adapted to receive a
value representing a weight number;

weighting logic for generating the weight number for each queue;

transmitting logic at each output port configured to transmit packets
identified in each queue according to a queue select signal and responsive to a done
signal;

scheduling logic at each output port configured to select one of the
queues and to generate the queue select signal to the transmitting logic to indicate
which queue will be transmitting;

counter logic, at each output port, associated with the counters
configured to decrement the weight register equal to a number of bytes transmitted by
the transmitting logic;

zero logic configured to transmit the done signal when the number in
the counter represents zero;

reloading logic configured to determine the number of packets
transmitted after the done signal and to place in the weight register a value equal to the
weight number minus the number of packets transmitted after the done signal.

6. The apparatus of claim 5, wherein the scheduling logic is configured to
respond to the done signal and the transmitting logic to select a next transmitting
queue.

7. The apparatus of claim 5, wherein the scheduling logic is configured to
select a next transmitting queue at a time prior to the zero logic generating the done
signal.
8. An apparatus in a network element that is adapted to transmit a packet
to multiple recipients and includes services for reservation-based protocols, for
handling multiple priorities, the apparatus comprising:

at least two output ports, one associated with each of the multiple
recipients, each of the output ports having at least a first output queue and at least a
second output queue, the first output queue having a priority higher than the second
output queue, at each output port;

a memory configured to output forwarding information about the packet
in response to a memory access based in part on a header of the packet, the forwarding
information including priority information indicating to which output queue at each
output port the packet will be directed.

9. The apparatus of claim 8, further including:

a central processing unit coupled to the memory;

a computer program mechanism coupled to the central processing unit
and configured to modify the priority information based on an amount of packets being
transmitted through one of the output ports.

10. The apparatus of claim 8, further including:

a central processing unit coupled to the memory;

a computer program mechanism coupled to the central processing unit
and configured to modify the priority information based on reservation-based protocol
information communicated between the network element and an intended recipient of
the multicast packet.